Managing use of lease resources allocated on fallover in a high availability computing environment

ABSTRACT

Responsive to a cluster manager for a particular node from among multiple nodes allocating at least one leased resource for a resource group for an application workload on the particular node, on fallover of the resource group from another node to the particular node, setting a timer thread, by the cluster manager for the particular node, to track an amount of time remaining for an initial lease period of the at least one leased resource. Responsive to the timer thread expiring while the resource group is holding the at least one leased resource, maintaining, by the cluster manager for the particular node, the resource group comprising the at least one leased resource for an additional lease period and automatically incurring an additional fee, only if the particular node has the capacity to handle the resource group at a lowest cost from among the nodes.

BACKGROUND

1. Technical Field

The embodiment of the invention relates generally to managing use of lease resources allocated on fallover in a high availability computing environment.

2. Description of Related Art

In some computing environments, it is important that the computing environment continue to handle application workloads even if one or more resources handling the application workloads within the computing environment fail. For a computing environment to continue to handle application workloads, even if one or more resources handling the application workloads within the computing environment fail, the computing environment may implement redundant computers in groups or clusters and implement a high availability controller that provides for automated continued service to application workloads when system components within the computing environment fail. In one example, application workloads require one or more applications running on one or more resources in a resource group. To provide high availability for applications needed for application workloads, when system components fail or other conditions in the cluster change, the high availability (HA) controller detects when the conditions in the cluster change, and moves the resource group for the workload to a standby node. Moving the resource group for the workload to a standby node includes configuring the resources required for the resource group on the standby node and starting the applications for the workload on the resource group on the standby node.

For an HA controller to start applications on a standby node, the HA controller determines whether the standby node needs additional processor, memory, and other hardware resources for the resource group to handle the applications and configures the resource group on the standby node with the required resources before starting the application on the standby node. In some computing environments, the HA controller can dynamically add physical and logical resources to a standby node, such as by dynamically allocating CPUs and memory to a logical partition on a node, to increase the hardware resources available for handling application workloads moved over to the standby node.

In some computing systems, the resources that can be dynamically allocated to a standby node include on-demand lease resources, such as IBM®'s Capacity Upgrade on Demand (CUoD) resources (IBM is a trademark of International Business Machines Corporation). CUoD resources are hardware resources that are preinstalled into a server to provide additional capacity, such as additional CPU and memory, but are not active until a client decides to enable the CUoD resources by acquiring a license to activate the CUoD resources, from a service provider, for a lease period for a fee. The high availability controller or a user determines when to activate lease resources, such as for increasing the resources available to a standby node to handle a fallover of an application for a workload from a primary node.

BRIEF SUMMARY

When an HA controller must allocate lease resources on a standby node in order to move a resource group from a first node to the standby node when conditions change within the cluster, the standby node only has sufficient resources for the resource group if the lease resources are held in the resource group until the application workload on the resource group is completed. If the applications continue on the standby node even after the initial lease period for the lease resources concludes, an additional fee is automatically incurred for the resource group holding the lease resources after the initial lease period expires. In view of the foregoing, there is a need for a method, computer system, and computer program product for the HA controller to continuously monitor, after lease resources are allocated to a resource group moved to a standby node, whether the resource group has released the lease resources and the amount of time remaining in an initial lease period for the lease resources, and to determine, when the initial lease period expires, whether to move the resource group to another node or to maintain the resource group on the node and hold the lease resources for an additional lease period.

An embodiment of the invention provides a computer system for managing lease resources. The computer system comprises a cluster manager, coupled to at least one processor and memory, for a particular node from among a plurality of nodes communicatively connected through a network. The cluster manager is programmed, responsive to the cluster manager allocating at least one leased resource for a resource group for an application workload on the particular node on fallover of the resource group from another node from among the plurality of nodes to the particular node, to set a timer thread to track an amount of time remaining for an initial lease period of the at least one leased resource. The cluster manager is programmed, responsive to the timer thread expiring while the resource group is holding the at least one leased resource, to maintain the resource group comprising the at least one leased resource for an additional lease period and automatically incur an additional fee for use of the at least one leased resource by the particular node only if the particular node has the capacity to handle the resource group at a lowest cost from among the plurality of nodes.

In another embodiment, a computer program product is provided for managing resources. The computer program product comprises a non-transitory computer readable storage medium having program instructions embodied therewith. The program instructions are executable by a computer to cause the computer to, responsive to a cluster manager for a particular node from among a plurality of nodes communicatively connected through a network allocating at least one leased resource for a resource group for an application workload on the particular node on fallover of the resource group from another node from among the plurality of nodes to the particular node, set a timer thread to track an amount of time remaining for an initial lease period of the at least one leased resource. The program instructions are executable by a computer to cause the computer to, responsive to the timer thread expiring while the resource group is holding the at least one leased resource, maintain the resource group comprising the at least one leased resource for an additional lease period and automatically incur, by the cluster manager for the particular node, an additional fee, only if the particular node has the capacity to handle the resource group at a lowest cost from among the plurality of nodes.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The novel features believed characteristic of one or more embodiments of the invention are set forth in the appended claims. The one or more embodiments of the invention itself, however, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:

FIG. 1 illustrates a block diagram of one example of one embodiment of a high availability (HA) computing environment in which a high availability controller manages resource group fallover, where at least one machine within the HA computing environment activates lease resources for facilitating high availability for applications on resource group fallover;

FIG. 2 illustrates a block diagram of one example of a cluster manager on a node within an HA computing environment;

FIG. 3 illustrates a block diagram of one example of data structures for identifying application and resource group requirements in an HA computing environment;

FIG. 4 illustrates a block diagram of one example of a standby node with CUoD resources activated and allocated to a resource group for an application workload on fallover;

FIG. 5 illustrates a block diagram of one example of an HA controller managing an application within an HA computing environment when the lease period expires for CUoD resources allocated to a resource group on fallover and still held by the resource group;

FIG. 6 illustrates a block diagram of one example of analyzed capacity responses and decisions by a fallover controller, in response to the expiration of a CUoD lease period for CUoD resources allocated on fallover of a resource group;

FIG. 7 illustrates one example of a schematic of a computer system in which the present invention may be implemented;

FIG. 8 illustrates a high level logic flowchart of a process and program for monitoring use of CUoD resources on fallover of a resource group for an application workload in an HA computing environment;

FIG. 9 illustrates a high level logic flowchart of a process and program for controlling a timer thread counting a lease period remaining for CUoD resources allocated on fallover of an application in an HA computing environment; and

FIG. 10 illustrates a high level logic flowchart of a process and program for a fallover controller checking a capacity of other nodes to handle a resource group and deciding whether to fallover the resource group to another node.

DETAILED DESCRIPTION

In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.

In addition, in the following description, for purposes of explanation, numerous systems are described. It is important to note, and it will be apparent to one skilled in the art, that the present invention may execute in a variety of systems, including a variety of computer systems and electronic devices operating any number of different types of operating systems.

FIG. 1 illustrates a block diagram of one example of one embodiment of a high availability (HA) computing environment in which a high availability controller manages resource group fallover, where at least one machine within the HA computing environment activates lease resources for facilitating high availability for applications on resource group fallover.

In the example, a high availability (HA) computing environment 100 represents a computing environment in which multiple computing systems communicate with one another and interact to handle workloads, jobs, or other computational tasks via one or more network connections. In one example, while HA computing environment 100 includes multiple computing systems, computing environment 100 may be viewed as a single system.

In one example, HA computing environment 100 includes one or more computing systems, viewed as multiple nodes, illustrated as a node 110, a node 120, and a node 130, in a cluster, such as a PowerHA® SystemMirror® cluster. In one example, each of node 110, node 120, and node 130 is a processor that runs an operating system, a cluster manager, and one or more applications for handling workloads, and each node may own a set of resources, including, but not limited to, disks, volume groups, file systems, networks, network addresses, and applications. Each of node 110, node 120, and node 130 may include a separate physical system, a separate logical system, or a separate virtualized system, each of which may include one or more server systems, machines, or frames, the resources of each of which may be partitioned into one or more logical partitions. In one example, HA computing environment 100 may include multiple System p® servers divided into logical partitions, RS/6000®, System i®, Blades, or System p® standalone systems, or a combination of these systems. Each of node 110, node 120, and node 130 may be connected through one or more network connections that enable each node to communicate, directly or indirectly, with at least one other node, including, but not limited to, local area networks, wide area networks, wireless networks, and wired networks. One of ordinary skill in the art will appreciate that HA computing environment 100 may be implemented using multiple types of distributed computing environments.

In one example, each of node 110, node 120, and node 130 share one or more sets of resources, including shared storage 102, which may include one or more disks. In one example, shared storage 102 may include shared configuration data 104 that includes one or more types of configuration information including, but not limited to, the hardware configuration and capacity of each node in HA computing environment 100, and resource requirements and configuration information for each application, each resource group, and other workload components implemented within HA computing environment 100.

In the example, each of node 110, node 120, and node 130 may access shared configuration data 104 and act as the central manager or primary node for HA computing environment 100 to determine the capacity of the nodes within HA computing environment 100 to handle workloads, to handle changing conditions in HA computing environment 100, and to manage distribution of workloads within HA computing environment 100. In another example, in HA computing environment 100 a machine separate from node 110, node 120, and node 130 may operate as the central manager. In one example, one or more of node 110, node 120, and node 130 may also communicate with an interface controller 150, where interface controller 150 provides an interface through which a client, or client layer, may specify configurations of each of node 110, node 120, and node 130 and specify configuration data 104, and through which each of node 110, node 120, and node 130 may send messages to the client.

In the embodiment, HA computing environment 100 provides high availability for application workloads, when the conditions in HA computing environment 100 change, by providing automated fallover of resource groups running application workloads from one node to another node within HA computing environment 100. Examples of conditions in HA computing environment 100 changing include, but are not limited to, when a resource of a node running a resource group with an application workload fails and when a node triggers a fallover of a resource group when an initial lease period expires for activated lease resources held by the resource group.

In one example, HA computing environment 100 automates the process of ensuring the high availability of applications within HA computing environment 100 through a high availability controller implemented within HA computing environment 100, such as PowerHA® SystemMirror® software, through a cluster manager (CM) application instance running on each node within HA computing environment 100. In the example illustrated, the high availability controller is implemented through CM 112 on node 110, CM 122 on node 120, and CM 132 on node 130. Each of CM 112, CM 122, and CM 132 may use shared storage 102 to facilitate efficient movement of resource groups from one node to another node and to access shared configuration information 104. In another example, a system separate from the nodes may provide the high availability controller for managing fallover. In other embodiments, HA computing environment 100 may include additional or alternate nodes and configurations of nodes.

CM 112, CM 122, and CM 132 may detect that failures have occurred from one or more messages including, but not limited to, an error message from another CM, an error message in shared configuration information 104, or a CM not outputting a heartbeat. In addition, in one example, node 110, node 120, and node 130 may be connected to one or more hardware management consoles (HMCs), such as HMC 140, where HMC 140 represents a controller that controls the physical allocation of hardware resources within HA computing environment 100, detects when hardware errors occur within HA computing environment 100, and sends messages to CM 112, CM 122, and CM 132 with error information, such as when a machine within HA computing environment 100 is not responding to heartbeat requests. In other embodiments, HA computing environment 100 may include additional or alternate controllers for detecting errors and passing error messages within HA computing environment 100. In the example, shared configuration information 104 may specify a priority or policy for each CM to use to determine how the CM should handle failure messages.

In particular, each of CM 112, CM 122, and CM 132 may dynamically allocate resources to a resource group on node 110, node 120, and node 130, respectively. In one example, dynamic allocation of resources to a resource group on a node includes dynamically allocating resources to a dynamic logical partition for a resource group on a node. In one example, applications fallover from one node to another node by allocating each resource group within one or more logical partitions on one node and dynamically moving one or more logical partitions, with the resource group requirements, from one node to another node. In one example, when the one or more logical partitions are moved from one node to another node, the resources required for the resource group are dynamically allocated to the logical partition on the new node and an instance of the application is restarted on the resource group in the logical partition. Resources dynamically allocated to the logical partition on the new node may include resources allocated from a free pool, where the free pool includes permanent resources for a node that can be dynamically allocated through HMC 140 to a logical partition, and may include resources allocated from a CUoD pool, where the CUoD pool represents the CUoD resources that can be allocated once a license has been acquired to activate the CUoD resources for a lease period.
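
The allocation order described above can be pictured with a short sketch. This is illustrative only; the function name and the idea of returning a (free pool, CUoD) split are assumptions for the example, not part of PowerHA or any HMC interface.

```python
# Satisfy a resource group's CPU requirement first from the node's free pool,
# then from preinstalled CUoD CPUs that must be activated under a license.
def plan_cpu_allocation(cpus_needed, free_pool_cpus, inactive_cuod_cpus):
    from_free = min(cpus_needed, free_pool_cpus)
    from_cuod = cpus_needed - from_free
    if from_cuod > inactive_cuod_cpus:
        raise RuntimeError("node cannot satisfy the resource group, even with CUoD")
    # The from_cuod CPUs require acquiring a CUoD license before allocation.
    return from_free, from_cuod
```

For the FIG. 4 example discussed later, plan_cpu_allocation(6, 2, 8) would return (2, 4).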

In particular, dynamic allocation of resources to a resource group on a node may include dynamically allocating lease resources to a dynamic logical partition for a resource group on a node. In one example, lease resources, such as CUoD resources, are resources pre-installed on one or more machines that are inactive and not allocable until the lease resources are activated for a lease period, in exchange for a fee. In the example, a CM activating lease resources acquires a license from a CUoD lease controller 152, for example, where the license specifies a fee for use of the lease resources for an initial lease period and also specifies that if lease resources are not released prior to the expiration of the initial lease period, the lessee automatically incurs an additional fee for an additional lease period. In particular, in the example, under the license, when an initial lease period for a leased resource expires, the leased resource does not automatically change from an active state to an inactive state. The client acquiring the license to activate the lease resource must release the lease resource, and may also be required to reset the lease resource to an inactive state, to end the license period. In the example, CM 122 activates CUoD resources 128 on fallover but does not activate CUoD resources 129. In addition, in the example, CM 132 activates CUoD resources 138. In one example, CM 122 and CM 132, through HMC 140, manage activations of CUoD resources using acquired CUoD licenses and manage deactivations of CUoD resources once the resources are released from resource group 124.

In one example, in HA computing environment 100, resource groups are placed on a node at startup, fallover, or fallback. Startup is the activation of a resource group on a node or multiple nodes. Resource group startup occurs during cluster startup or initial acquisition of the resource group on a node. Fallover is the movement of a resource group from the node that currently owns the resource group to another active node after the conditions on the node that currently owns the resource group change, such as the current node experiencing a failure. Fallback is the movement of a resource group from the node on which it currently resides to a node that is joining or reintegrating into the cluster, based on a criterion.

In the example illustrated, at startup, a resource group 114 is started on node 110, including an allocation of a minimum number of resources required for the applications for resource group 114 and at least one instance of an application started on resource group 114, illustrated as application A instance 116. In the example, shared configuration information 104 may specify the minimum resource requirements for each application. In the example, node 110 may include sufficient resources to handle the resource requirements for the application workload for the duration of the workload.

In the example illustrated, node 110 fails, CM 122 detects the failure, and CM 122 initiates a fallover of resource group 114 by moving resource group 114 to node 120, as resource group 124. Moving resource group 114 to node 120, as resource group 124, includes CM 122 configuring resources for resource group 124 to allocate the minimum number of resources required for the applications for resource group 124 and restarting the applications on resource group 124, illustrated as application A instance 126. In the example, CM 122 activates CUoD resources 128 and allocates CUoD resources 128 to resource group 124. In the example, where a resource group is moved to another node on fallover and lease resources, such as CUoD resources 128, are activated and allocated to the resource group, the resource group on fallover initially starts with sufficient resources to handle application requirements; however, once the initial lease period for the CUoD resources expires, unless the resource group is moved to another node at the expiration of the initial lease period, if the resource group is still holding the lease resources, an additional cost is automatically incurred for the resource group to have sufficient resources to handle the application for an additional lease period. In the example, to avoid incurring additional costs at the expiration of the initial lease period, CM 122 sets a timer thread to count the time remaining on the initial lease period for CUoD resources 128 and CM 122 monitors whether resource group 124 releases CUoD resources 128. When the timer thread expires, if resource group 124 is still holding CUoD resources 128, CM 122 determines whether any other node has the capacity to handle the resource group at a lower cost than node 120.
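
A minimal sketch of the timer CM 122 sets in this example is shown below, using the standard-library threading.Timer; the callback and the release event are illustrative names only, and the text does not prescribe this particular mechanism.

```python
import threading

def start_lease_timer(adjusted_lease_seconds, cuod_released, on_expired):
    """Fire on_expired() only if the resource group still holds the CUoD resources."""
    def _check():
        if not cuod_released.is_set():   # resources still held when the lease runs out
            on_expired()                 # e.g. compare other nodes' costs, then move or stay
    timer = threading.Timer(adjusted_lease_seconds, _check)
    timer.daemon = True
    timer.start()
    return timer                         # call timer.cancel() when the resources are released
```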

In the example, at the expiration of the CUoD lease period, CM 122 decides to move resource group 124 to node 130, because CUoD resources 138 are available for allocation and include additional time on the initial lease period; therefore, node 130 can handle resource group 124 at a lower cost than node 120. CM 122 initiates a fallover of resource group 124 to node 130, as resource group 134. Moving resource group 124 to node 130, as resource group 134, includes CM 132 configuring resources for resource group 134 to allocate the minimum number of resources required for the applications for resource group 134 and restarting the applications on resource group 134 as a new instance of application A. In one example, node 130 may include CUoD resources 138 with additional lease time remaining on the initial lease period because CUoD resources 138 were activated for a fallover, but the application workload using CUoD resources 138 completed before the end of the initial lease period and CUoD resources 138 were released to a free pool to be dynamically allocated to other resource groups or to be deactivated at the expiration of the initial lease period. When CM 132 allocates CUoD resources 138 to resource group 134, CM 132 sets a timer thread to count the time remaining on the initial lease period for CUoD resources 138 and CM 132 monitors whether resource group 134 releases CUoD resources 138. When the timer thread expires, if resource group 134 is still holding CUoD resources 138, CM 132 determines whether any other node has the capacity to handle the resource group at a lower cost than node 130.

In particular, in one example, by configuring node 120 and node 130 as standby nodes with permanent, paid-for resources limited to a minimum number of resources to run CM 122 and CM 132, respectively, but also including access to additional allocable resources from lease resources, node 120 and node 130 only use additional resources when necessary to provide high availability to applications on fallover. A resource group running on a node will, however, hold lease resources as long as an application workload is running on the resource group, regardless of the length of the initial lease period for the lease resources. In addition, the license for a lease resource specifies that additional fees will be incurred if the lease resources are not released by the expiration of the initial lease period. Therefore, the CM allocating lease resources to resource groups when resource groups fallover to a node needs to continuously monitor whether a resource group has released a lease resource, track the time remaining on the lease period, and determine whether to move a resource group holding a leased resource to another node to reduce costs when the initial lease period expires. The CM queries other nodes to determine whether other nodes have sufficient permanent resources to handle the resource group and whether other nodes have allocable lease resources with time remaining on an initial lease period.

In particular, in the example, once CUoD resources 128 are activated and allocated to resource group 124 on node 120 by HMC 140, CUoD resources 128 are allocated to the one or more logical partitions for resource group 124 until the requirements of application A instance 126 are met or until application A is moved to another resource group on node 120 or on another node. In particular, when application A instance 126 no longer requires CUoD resources 128, resource group 124 returns CUoD resources 128 to a free pool and CM 122 may request to deactivate CUoD resources 128. In one example, CM 122 deactivates CUoD resources 128 by directing HMC 140 to return CUoD resources 128 to an inactive state and returning a deactivation confirmation message to CUoD lease controller 152 indicating the CUoD resources have been returned to an inactive state.
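
The release-and-deactivate sequence in this paragraph might look like the following sketch; hmc and lease_controller stand in for interfaces the text only names (HMC 140 and CUoD lease controller 152), and the method names are assumptions.

```python
def release_cuod_resources(resource_group, cuod_cpus, free_pool, hmc, lease_controller):
    # Resource group returns the CPUs to the free pool once the application no longer needs them.
    resource_group.release(cuod_cpus)
    free_pool.extend(cuod_cpus)
    # The cluster manager may then request deactivation: the HMC marks the CPUs
    # inactive and a confirmation message is returned to the lease controller.
    hmc.set_inactive(cuod_cpus)
    lease_controller.confirm_deactivation(cuod_cpus)
```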

Because CUoD resources 128 will be held by resource group 124 until application A instance 126 no longer needs the resources, CUoD resources 128 may be held by resource group 124 after the initial lease period specified in the CUoD license expires, incurring additional fees for the continued use of the CUoD resources, according to the terms of the CUoD license. To minimize the costs associated with activation of CUoD resources on fallover, when CUoD resources are allocated to a resource group on fallover, CM 122 triggers a timer thread, set to the time remaining for the lease period for the CUoD resources. CM 122 continues to monitor the status of use of the CUoD resources and cancels the timer thread when the CUoD resources are released. If the timer thread count expires, CM 122 determines whether there are other nodes, including other resource groups on the node, that can handle the workload at a lower cost than resource group 124 incurring additional fees for use of CUoD resources 128 after the initial lease period. If CM 122 determines there are other nodes or other resource groups on the node that can handle the workload at a lower cost than resource group 124, CM 122 manages movement of the workload to another node. If CM 122 maintains the workload on node 120 and continues to hold CUoD resources 128, CM 122 triggers a message for output via interface controller 150, indicating that an additional fee has been incurred for use of CUoD resources 128 after the initial lease period has expired.

FIG. 2 illustrates a block diagram of one example of a cluster manager on a node within an HA computing environment. In the example, CM 202, implemented on a node, such as CM 112, CM 122, or CM 132, includes a resource controller 204 for interfacing with an HMC and controlling resource allocations, including dynamic resource allocations to a dynamic logical partition and migration of dynamic logical partitions. In the example, CM 202 includes a cluster communication controller 220 for controlling communications with other nodes and components within HA computing environment 100. Although not depicted, CM 202 may include one or more of a hypervisor or other middleware virtualization layer or may communicate with a hypervisor or other middleware virtualization layer, for managing logical partitions and other groupings of virtualized resources.

In the example, CM 202 includes a fallover controller 224 for monitoring for errors in HA computing environment 100 and managing fallover of a resource group to a node, and a CUoD timer manager 206 for controlling a timer for monitoring for the expiration of a lease period for CUoD resources allocated on fallover of a resource group and still held by the resource group. In the example, fallover controller 224 may detect an error message from another node or from HMC 140 indicating a failure requiring fallover of a resource group. Fallover controller 224 controls fallover of the resource group to the node, including, but not limited to, controlling allocation of resources to a logical partition for the resource group and restarting the application for the application workloads on the resource group. Allocation of resources to a logical partition for the resource group may include fallover controller 224 requesting activation of CUoD resources, to have sufficient resources to allocate to the resource group for the application.

In the example, on fallover of a resource group requiring an allocation of CUoD resources, fallover controller 224 triggers a timer thread 208. Timer thread 208 includes a resource group ID 209 of the resource group holding the allocated CUoD resources and a counter 210 set to count an adjusted lease period. Fallover controller 224 monitors for a change in the status of the allocated CUoD resources. If resource controller 204 indicates the CUoD resources are released, such as by being returned to a free pool or deactivated, fallover controller 224 cancels timer thread 208. If counter 210 on timer thread 208 expires, indicating that the lease period for the held CUoD resources is about to expire, timer thread 208 sends a message to fallover controller 224 indicating the timer has expired. Fallover controller 224 receives expired timer messages and, in response, determines whether there are other nodes with the capability to handle the resource group. In one example, fallover controller 224 accesses shared configuration information 104 to determine whether there are other nodes configured with sufficient resources to handle the resource group. If fallover controller 224 determines there are other nodes configured with sufficient resources to handle the resource group, fallover controller 224 triggers a protocol to send capacity requests through cluster communication controller 220 to the live nodes, and records the outgoing request in node communications 222. Cluster communication controller 220 gathers responses to the capacity request in node communications 222. Fallover controller 224 analyzes the capacity responses for the capacity request, gathered in node communications 222, and determines whether there is another node with the capacity to handle the resource group at a lower cost than the cost associated with the current node handling the resource group. If fallover controller 224 determines there is another node with the capacity to handle the resource group at a lower cost than the cost associated with the current node handling the resource group, fallover controller 224 initiates movement of the resource group to the selected node. If fallover controller 224 does not identify another node, and maintains the resource group with the CUoD resources, fallover controller 224 initiates a message to a client indicating that additional fees are being incurred for use of the CUoD resources after the lease period expires.
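
A small stand-in for node communications 222 is sketched below: the fallover controller records an outgoing capacity request, the cluster communication controller files responses against it, and the controller later reads all responses for that request. The class and method names are invented for the sketch.

```python
import uuid

class NodeCommunications:
    """Illustrative record of outgoing capacity requests and gathered responses."""
    def __init__(self):
        self._requests = {}                       # request id -> request record

    def record_request(self, resource_group_id, target_nodes):
        request_id = str(uuid.uuid4())
        self._requests[request_id] = {"resource_group": resource_group_id,
                                      "targets": list(target_nodes),
                                      "responses": []}
        return request_id

    def record_response(self, request_id, node, capacity, cost):
        self._requests[request_id]["responses"].append(
            {"node": node, "capacity": capacity, "cost": cost})

    def responses(self, request_id):
        return list(self._requests[request_id]["responses"])
```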

FIG. 3 illustrates a block diagram of one example of data structures for identifying application and resource group requirements in an HA computing environment. In the example, each of CM 112, CM 122, and CM 132 may store a local copy of each of an application record 302 and a resource group record 320 and reference each of application record 302 and resource group record 320 on fallover of a resource group. In the example, application record 302 identifies an application controller name 304, with the name of an application, start and stop scripts 306 for starting and stopping an application, and a resource group name 308 for identifying the resource group for the application. In the example, resource group record 320 includes a resource group name 310 for identifying the resource group, a minimum resource requirement 312 identifying the minimum resource requirements for the resource group, node names 314 identifying the nodes that can run the resource group, and a fallover policy 316 identifying whether the resource group is permitted to fallover to another node. In the example, for fallover controller 224 to determine whether to fallover a resource group to another node and, if so, which node to use on fallover of a resource group, the CM looks up resource group record 320 to determine, from node names 314, which nodes are available for fallover of the resource group, to determine, from fallover policy 316, a priority for selecting among the available nodes, and to determine the minimum resource requirements for the resource group from minimum resource requirement 312. The node restarting an application on fallover references start and stop scripts 306 to start an application on the node and references start and stop scripts 306 to stop the application on the node when the application is complete.
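
The two records of FIG. 3 can be written out as simple data classes; the field names follow the description above, while the Python types and default values are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ApplicationRecord:                  # application record 302
    application_controller_name: str      # 304: name of the application
    start_script: str                     # 306: script to start the application
    stop_script: str                      # 306: script to stop the application
    resource_group_name: str              # 308: resource group for the application

@dataclass
class ResourceGroupRecord:                # resource group record 320
    resource_group_name: str              # 310
    minimum_resource_requirement: int     # 312: e.g. minimum CPUs for the group
    node_names: List[str] = field(default_factory=list)  # 314: nodes that can run the group
    fallover_policy: str = "next-priority-node"          # 316: assumed example value
```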

FIG. 4 illustrates a block diagram of one example of a standby node with CUoD resources activated and allocated to a resource group for an application workload on fallover. In the example, a standby node, such as node 120, prior to fallover, includes an LPAR allocated with 2 CPU, as illustrated at reference numeral 404, a free pool 406 of 2 CPU available for dynamic allocation to LPAR 404, and 8 CPU accessible as inactive CUoD resources 408, where inactive CUoD resources 408 require a CUoD license, acquired in exchange for a fee, for activation prior to allocation. In the example, a resource group running application A needs to fallover to the standby node. As illustrated at reference numeral 402, the requirement for the LPAR is 2 CPU and the requirement for the resource group for the application is 6 CPU; therefore, 8 CPU will need to be allocated to the LPAR to handle both the LPAR requirement and the resource group requirement. In the example, for fallover of the resource group, the standby node is configured, as illustrated at reference numeral 410, with the LPAR configured with the 2 CPU originally allocated to the LPAR and with a resource group 412 configured with 2 CPU allocated from free pool 406 and 4 CPU activated and allocated from CUoD resources 408. As illustrated at reference numeral 414, the inactive CUoD resources are reduced from 8 CPUs to 4 CPUs after the activation and allocation of 4 CPUs.
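
The arithmetic in the FIG. 4 example works out as follows, using only the numbers the figure description states:

```python
lpar_requirement = 2        # CPUs the LPAR itself needs (reference numeral 402)
rg_requirement = 6          # CPUs the incoming resource group needs
free_pool = 2               # CPUs available in the free pool (406)
cuod_inactive = 8           # preinstalled, inactive CUoD CPUs (408)

total_lpar_cpus = lpar_requirement + rg_requirement   # LPAR must grow to 8 CPUs
from_free = min(rg_requirement, free_pool)            # 2 CPUs taken from the free pool
from_cuod = rg_requirement - from_free                # 4 CUoD CPUs activated under license
cuod_inactive -= from_cuod                            # 4 CUoD CPUs remain inactive (414)

assert (total_lpar_cpus, from_free, from_cuod, cuod_inactive) == (8, 2, 4, 4)
```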

In the example, the CUoD license lease period is 20 hours, as illustrated at reference numeral 416. In the example, the fallover controller for the node starts a timer thread 418 set to an adjusted lease period of 19.9 hours. In one example, the time set in a timer thread is an adjusted lease period, sufficient to allow for a determination whether to fallover the application to another node and deactivate the CUoD resources at the end of the lease period.
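
The adjusted lease period is simply the license lease period minus a reporting margin; the 0.1-hour margin below is implied by the 20-hour lease and the 19.9-hour timer in this example.

```python
cuod_lease_hours = 20.0          # lease period in the CUoD license (reference numeral 416)
reporting_margin_hours = 0.1     # time reserved to run the capacity check and react
adjusted_lease_hours = cuod_lease_hours - reporting_margin_hours  # 19.9 hours set in timer thread 418
```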

FIG. 5 illustrates one example of a block diagram of an HA controller managing an application within an HA computing environment when the lease period expires for CUoD resources allocated to a resource group on fallover and still held by the resource group. In the example, CM 122 detects the expiration of a timer thread, such as the expiration of timer thread 418. CM 122 determines whether any other nodes have the capability to handle the resource group associated with the timer thread and, if other nodes have the capability to handle the resource group, sends a capacity request to the CM for the other live nodes. In the example, CM 112 is on a node that failed and is still not live and CM 132 is on a node that is live. As illustrated at reference numeral 506, CM 122 sends a capacity query to CM 132. CM 132 responds with a capacity response, as illustrated at reference numeral 508. CM 122 collects capacity responses and, as illustrated at reference numeral 510, decides whether to (A) maintain the resource group on the node or to (B) move the resource group to another node. If CM 122 decides to maintain the resource group, CM 122 triggers a cost message for output to the client, as illustrated at reference numeral 512, indicating that additional fees are being incurred for use of the CUoD resources beyond the lease period per the CUoD license. If CM 122 decides to move the resource group, CM 122 triggers a fallover of the resource group to another node, as illustrated at reference numeral 514.

FIG. 6 illustrates one example of a block diagram of analyzed capacity responses and decisions by a fallover controller, in response to the expiration of a CUoD lease period for CUoD resources allocated on fallover of a resource group. In the example, an application is currently running on a resource group with six CPUs on node 2, such as resource group 412, illustrated in FIG. 4, which includes four CUoD CPUs activated and allocated at fallover and two additional CPUs allocated from a free pool. At reference numeral 610, the lease period for the four CUoD CPUs activated at fallover is about to expire, node 2 sends capacity requests to other nodes, and the capacity responses illustrated show the resource capacity available on each node to allocate to a new resource group, where the resource group requires six CPUs. In the example, at reference numeral 610, the resource group (RG) capacity for node 1 is not determined because node 1 is still offline, the RG capacity for node 2 includes four inactive CUoD CPUs, and the RG capacity for node 3 includes two CPUs available in a free pool and four inactive CUoD CPUs. In the example, node 2 decides to maintain the resource group on node 2 and restart the timer thread for the CUoD resources for an additional lease period. In particular, in the example, maintaining the resource group on node 2 would require incurring additional fees for holding the four activated, expired CUoD resources in the resource group, and moving the resource group to node 3 would require activating four CUoD resources on node 3; therefore, unless the fee for the four CUoD resources on node 3 is less than the fee for the CUoD resources on node 2, there would not be a cost benefit to moving the resource group to node 3 and activating four new CUoD resources. In one example, where no additional lease period is specified in the CUoD license, node 2 may automatically select a duration for an additional lease period. In addition, in the example, at reference numeral 610, a notification message is triggered identifying that an additional fee is incurred for the four CUoD CPUs.
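
The decision at reference numeral 610 reduces to a fee comparison; the sketch below assumes a flat per-CPU activation and renewal fee, which the text does not specify, so the numbers are placeholders.

```python
def stay_cost(expired_cuod_cpus_to_renew, renewal_fee_per_cpu):
    # Keep the group on node 2: renew the lease on the expired CUoD CPUs.
    return expired_cuod_cpus_to_renew * renewal_fee_per_cpu

def move_cost(cuod_cpus_to_activate, activation_fee_per_cpu):
    # Move the group to node 3: its 2 free-pool CPUs cost nothing, but 4 CUoD CPUs must be activated.
    return cuod_cpus_to_activate * activation_fee_per_cpu

fee = 10  # placeholder fee per CUoD CPU per lease period
decision = "move to node 3" if move_cost(4, fee) < stay_cost(4, fee) else "maintain on node 2"
# With equal fees the costs tie, so node 2 maintains the resource group, as in the example.
```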

In the example, at the next expiration of the timer on node 2, indicating the additional lease period for the CUoD resources on node 2 has expired again, as illustrated at reference numeral 614, node 2 receives capacity responses and determines whether to maintain the resource group at node 2 or move the resource group to another node. In the example, at reference numeral 614, the RG capacity for node 1 is not determined because node 1 is still offline, the RG capacity for node 2 includes four inactive CUoD CPUs, and the RG capacity for node 3 includes four active CUoD CPUs available in a free pool with fifty hours remaining and four inactive CUoD CPUs. In the example, node 2 decides to move the resource group to node 3 and a new timer is started on node 3. In particular, in the example, maintaining the resource group on node 2 would require incurring additional fees for holding the four activated, expired CUoD resources in the resource group, and moving the resource group to node 3 would only require activating two CUoD resources, along with using the remaining time on the other CUoD resources already activated; therefore, it is more cost effective to move the resource group to node 3 and activate two CUoD resources, rather than maintain the resource group on node 2 and extend the lease on four CUoD resources. In the example, node 3 starts a timer thread with the shortest CUoD lease period set in the counter, which in the example is the 50 hours remaining on the four active CUoD CPUs allocated from the free pool. In one example, if multiple timers are set, a single thread or multiple threads may be used to monitor timer expirations. In addition, in the example, at reference numeral 614, a notification message is triggered identifying that a new fee is incurred for two CUoD CPUs on another node.

In the example, at the next expiration of the timer on node 3, indicating the additional lease period for the CUoD resources on node 3 has expired, as illustrated at reference numeral 616, node 3 receives capacity responses and determines whether to maintain the resource group at node 3 or move the resource group to another node. In the example, at reference numeral 616, the RG capacity for node 1 is eight CPUs available in the free pool, the RG capacity for node 2 is eight inactive CUoD CPUs, and the RG capacity for node 3 is two active CUoD CPUs in the free pool with twenty hours remaining and six inactive CUoD CPUs. In the example, node 3 decides to move the resource group to node 1. In particular, in the example, maintaining the resource group on node 3 would require incurring additional fees for holding the four activated, expired CUoD resources in the resource group, and moving the resource group to node 1 requires no CUoD resources. In the example, no additional notification message is triggered because no additional fees are incurred.

FIG. 7 illustrates one example of a schematic of a computer system in which the present invention may be implemented. The present invention may be performed in a variety of systems and combinations of systems, made up of functional components, such as the functional components described with reference to computer system 700, and may be communicatively connected to a network, such as network 702. In one example, each of node 110, node 120, node 130, HMC 140, interface controller 150, and CUoD lease controller 152 may implement one or more instances of functional components of computer system 700. In another example, computer system 700 may represent one or more cloud computing nodes.

Computer system 700 includes a bus 722 or other communication device for communicating information within computer system 700, and at least one hardware processing device, such as processor 712, coupled to bus 722 for processing information. Bus 722 preferably includes low-latency and higher latency paths that are connected by bridges and adapters and controlled within computer system 700 by multiple bus controllers. When implemented as a server or node, computer system 700 may include multiple processors designed to improve network servicing power. Where multiple processors share bus 722, additional controllers (not depicted) for managing bus access and locks may be implemented.

Processor 712 may be at least one general-purpose processor such as an IBM® PowerPC® (IBM and PowerPC are registered trademarks of International Business Machines Corporation) processor that, during normal operation, processes data under the control of software 750, which may include at least one of application software, an operating system, middleware, and other code and computer executable programs accessible from a dynamic storage device such as random access memory (RAM) 714, a static storage device such as Read Only Memory (ROM) 716, a data storage device, such as mass storage device 718, or other data storage medium. Software 750 may include, but is not limited to, code, applications, protocols, interfaces, and processes for controlling one or more systems within a network including, but not limited to, an adapter, a switch, a cluster system, and a grid environment.

In one embodiment, the operations performed by processor 712 may control the operations of the flowcharts of FIGS. 8, 9, and 10 and other operations described herein. Operations performed by processor 712 may be requested by software 750 or other code, or the steps of one embodiment of the invention might be performed by specific hardware components that contain hardwired logic for performing the steps, or by any combination of programmed computer components and custom hardware components.

Those of ordinary skill in the art will appreciate that aspects of one embodiment of the invention may be embodied as a system, method or computer program product. Accordingly, aspects of one embodiment of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment containing software and hardware aspects that may all generally be referred to herein as a “circuit,” “module,” or “system.” Furthermore, aspects of one embodiment of the invention may take the form of a computer program product embodied in one or more tangible computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, such as mass storage device 718, a random access memory (RAM), such as RAM 714, a read-only memory (ROM) 716, an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with the computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to, wireless, wireline, optical fiber cable, radio frequency (RF), etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of one embodiment of the invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, such as computer system 700, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, such as network 702, through a communication interface, such as network interface 732, over a network link that may be connected, for example, to network 702.

In the example, network interface 732 includes an adapter 734 for connecting computer system 700 to network 702 through a link. Although not depicted, network interface 732 may include additional software, such as device drivers, additional hardware and other controllers that enable communication. When implemented as a server, computer system 700 may include multiple communication interfaces accessible via multiple peripheral component interconnect (PCI) bus bridges connected to an input/output controller, for example. In this manner, computer system 700 allows connections to multiple clients via multiple separate ports and each port may also support multiple connections to multiple clients.

One embodiment of the invention is described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. Those of ordinary skill in the art will appreciate that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable medium that can direct a computer, such as computer system 700, or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, such as computer system 700, or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

Network interface 732, the network link to network 702, and network 702 may use electrical, electromagnetic, or optical signals that carry digital data streams. The signals through the various networks and the signals on network 702, the network link to network 702, and network interface 732, which carry the digital data to and from computer system 700, may be forms of carrier waves transporting the information.

In addition, computer system 700 may include multiple peripheral components that facilitate input and output. These peripheral components are connected to multiple controllers, adapters, and expansion slots, such as input/output (I/O) interface 726, coupled to one of the multiple levels of bus 722. For example, input device 724 may include, for example, a microphone, a video capture device, an image scanning system, a keyboard, a mouse, or other input peripheral device, communicatively enabled on bus 722 via I/O interface 726 controlling inputs. In addition, for example, output device 720, communicatively enabled on bus 722 via I/O interface 726 for controlling outputs, may include, for example, one or more graphical display devices, audio speakers, and tactile detectable output interfaces, but may also include other output interfaces. In alternate embodiments of the present invention, additional or alternate input and output peripheral components may be added.

Those of ordinary skill in the art will appreciate that the hardware depicted in FIG. 7 may vary. Furthermore, those of ordinary skill in the art will appreciate that the depicted example is not meant to imply architectural limitations with respect to the present invention.

FIG. 8 illustrates a high level logic flowchart of a process and program for monitoring use of CUoD resources on fallover of a resource group for an application workload in an HA computing environment. In the example, the process starts at block 800 and thereafter proceeds to block 802. Block 802 illustrates a determination by a fallover controller whether a resource group fallover requires a CUoD resource allocation for the resource group. In the example, if a resource group fallover requires a CUoD resource allocation for the resource group, then the process passes to block 804. Block 804 illustrates recording the resource ID for the CUoD resources associated with the resource group ID. Next, block 806 illustrates calculating an adjusted CUoD lease period by reducing the CUoD lease period by a reporting period. Thereafter, block 808 illustrates triggering a CUoD timer thread with the adjusted CUoD lease period and the resource group ID. Next, block 810 illustrates recording the timer thread ID in association with the resource group ID. Thereafter, block 812 illustrates monitoring the status of the resource group ID, and the process passes to block 814.

Block 814 illustrates a determination of whether the CUoD resources have been released by the resource group. If the CUoD resources have been released by the resource group, then the process passes to block 822. If the CUoD resources have not been released by the resource group, then the process passes to block 816.

Block 816 illustrates a determination whether the fallover controller receives an expired timer message for the resource group ID. If the fallover controller does not receive an expired timer message for the resource group ID, then the process passes to block 824. Block 824 illustrates a determination whether a CUoD lease time is updated. At block 824, if a CUoD lease time is not updated, then the process passes to block 814. At block 824, if a CUoD lease time is updated, then the process passes to block 826. Block 826 illustrates sending an adjusted, updated lease time to the timer thread ID for the resource group ID, and the process passes to block 806.

Returning to block 816, if the fallover controller does receive an expired timer message for the resource group ID, then the process passes to block 818. Block 818 illustrates triggering a capacity check. Next, block 820 illustrates a determination whether the fallover controller maintains the resource group on the node. If the fallover controller maintains the resource group on the node, then the process returns to block 806. If the fallover controller does not maintain the resource group on the node, then the process passes to block 822.
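
Read as code, the FIG. 8 flow might look like the sketch below; the cm object and its method names are assumptions standing in for the fallover controller's checks, and block numbers from the figure appear as comments.

```python
def monitor_cuod_on_fallover(cm, resource_group):
    if not cm.fallover_requires_cuod(resource_group):                        # block 802
        return
    cm.record_cuod_resource_ids(resource_group)                              # block 804
    while True:
        adjusted = cm.cuod_lease_period(resource_group) - cm.reporting_period  # block 806
        timer = cm.trigger_timer_thread(adjusted, resource_group.id)         # block 808
        cm.record_timer_thread_id(timer.id, resource_group.id)               # block 810
        while True:                                                          # blocks 812/814: monitor status
            if cm.cuod_released(resource_group):                             # block 814 -> block 822
                return
            if cm.expired_timer_message(resource_group.id):                  # block 816
                cm.trigger_capacity_check(resource_group)                    # block 818
                if cm.maintains_resource_group(resource_group):              # block 820 -> block 806
                    break                                                    # restart timer for a new lease period
                return                                                       # group moved off the node -> block 822
            if cm.cuod_lease_time_updated(resource_group.id):                # block 824
                cm.send_updated_lease_time(timer.id, resource_group.id)      # block 826 -> block 806
                break
```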

FIG. 9 illustrates a high level logic flowchart of a process and program for controlling a timer thread counting a lease period remaining for CUoD resources allocated on fallover of an application in an HA computing environment. In the example, the process starts at block 900 and thereafter proceeds to block 902. Block 902 illustrates a determination whether a new timer thread is created with a lease period for a resource group ID. If a new timer thread is created, the process passes to block 904. Block 904 illustrates setting a counter to count the lease period for the resource group ID. Next, block 906 illustrates starting the counter. Thereafter, block 910 illustrates a determination whether the counter is expired. At block 910, if the counter expires, then the process passes to block 912. Block 912 illustrates sending an expired timer message to the timer controller with the resource group ID, and the process ends. Returning to block 910, if the counter has not expired, then the process passes to block 914. Block 914 illustrates a determination whether a counter update is received. If a counter update is received, then the process passes to block 916. Block 916 illustrates updating the counter with the counter update value, and the process ends.
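
One standard-library way to realize the FIG. 9 timer thread is sketched below; a queue stands in for both the counter updates and the expired timer message to the fallover controller, which is an assumption rather than the mechanism the figure prescribes.

```python
import queue

def timer_thread(lease_period_seconds, resource_group_id, updates, expired_messages):
    remaining = lease_period_seconds                 # block 904: set the counter
    while remaining > 0:                             # blocks 906/910: count down until expired
        try:
            remaining = updates.get(timeout=1.0)     # blocks 914/916: apply a counter update
        except queue.Empty:
            remaining -= 1.0                         # no update this second; keep counting
    expired_messages.put(resource_group_id)          # block 912: expired timer message
```

In practice this function would run on its own thread (for example via threading.Thread), with the fallover controller reading expired_messages.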

FIG. 10 illustrates a high level logic flowchart of a process and program for a fallover controller checking a capacity of other nodes to handle a resource group and deciding whether to fallover the resource group to another node. In the example, the process starts at block 1000 and thereafter proceeds to block 1002. Block 1002 illustrates a determination whether a capacity check is triggered. If a capacity check is triggered, then the process passes to block 1004. Block 1004 illustrates identifying the resource requirements of the resource group triggering the capacity check, and the process passes to block 1006.

Block 1006 illustrates a determination whether any other nodes have the capability to handle the resource requirements for the resource group. At block 1006, if no other nodes have the capability to handle the resource requirements for the resource group, then the process passes to block 1020. Block 1020 illustrates maintaining the resource group on the current node. Next, block 1022 illustrates initiating a message indicating a fee has been incurred for an additional lease period for the CUoD resources. Thereafter, block 1024 illustrates determining the next lease period for the CUoD resources from the CUoD license, and the process ends.

Returning to block 1006, if other nodes have the capability to handle the resource requirements for the resource group, then the process passes to block 1008. Block 1008 illustrates initiating a capacity request protocol to all the nodes with the capability to handle the resource requirements. Next, block 1010 illustrates gathering capacity responses from the other nodes. Thereafter, block 1012 illustrates calculating a cost for each node, with capacity, to handle the resource group, and the process passes to block 1014.

Block 1014 illustrates a determination whether there is any other node with the capacity to handle the resource group at a lower cost than the current node handling the resource group, including the expired CUoD resources. At block 1014, if there is not another node with the capacity to handle the resource group at a lower cost, then the process passes to block 1020. At block 1014, if there is another node with the capacity to handle the resource group at a lower cost, then the process passes to block 1016. Block 1016 illustrates triggering a fallover of the resource group to a selected other node able to handle the resource group at a lower cost. Next, block 1018 illustrates initiating a message indicating the resource group has been moved to the other node and any additional fee incurred for a lease period on activated CUoD resources on the other node, and the process ends.
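Blocks 1002 through 1024 reduce to a cost comparison between keeping the resource group on the current node, and thereby incurring the additional CUoD lease fee, and moving it to a node that can host it more cheaply. The sketch below is illustrative only; the NodeCapacity record, the cost inputs, and the messaging hooks are assumptions for the example.

    # Illustrative sketch only: node records, costs, and hooks are assumptions.
    from dataclasses import dataclass

    @dataclass
    class NodeCapacity:
        node_id: str
        cost_to_host: float      # cost for this node to host the resource group

    def capacity_check(current_node_cost, requirements, query_nodes,
                       notify, trigger_fallover):
        """Decide whether to keep or move the resource group (blocks 1002-1024).

        query_nodes(requirements) returns NodeCapacity responses from nodes able
        to meet the requirements (blocks 1006-1012); current_node_cost includes
        the additional lease fee for the expired CUoD resources (block 1014).
        """
        responses = query_nodes(requirements)                 # blocks 1008-1010
        cheaper = [r for r in responses
                   if r.cost_to_host < current_node_cost]     # block 1014
        if not cheaper:                                        # blocks 1020-1024
            notify("additional CUoD lease fee incurred; resource group maintained")
            return None
        target = min(cheaper, key=lambda r: r.cost_to_host)    # block 1016
        trigger_fallover(target.node_id)                       # blocks 1016-1018
        notify(f"resource group moved to node {target.node_id}")
        return target.node_id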

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, occur substantially concurrently, or the blocks may sometimes occur in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising”, when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the one or more embodiments of the invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

While the invention has been particularly shown and described with reference to one or more embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

What is claimed is:
1. A computer system for managing resources, the computer system comprising: a cluster manager, coupled to at least one processor and memory, for a particular node from among a plurality of nodes communicatively connected through a network; the cluster manager programmed to: responsive to the cluster manager allocating at least one leased resource for a resource group for an application workload on the particular node on fallover of the resource group from another node from among the plurality of nodes to the particular node, set a timer thread to track an amount of time remaining for an initial lease period of the at least one leased resource; and responsive to the timer thread expiring while the resource group is holding the at least one leased resource, maintain the resource group comprising the at least one leased resource for an additional lease period and automatically incur an additional fee for use of the at least one leased resource by the particular node only if the particular node has the capacity to handle the resource group at a lowest cost from among the plurality of nodes.
2. The computer system according to claim 1, the cluster manager further programmed to: activate the at least one leased resource from an inactive resource to an active resource by acquiring a license for the at least one leased resource, wherein the cluster manager is only enabled to allocate the at least one leased resource as an active resource, wherein the license for the at least one leased resource specifies an initial fee for the initial lease period and the additional fee for the additional lease period, wherein the license for the at least one leased resource comprises an agreement to pay the additional fee for the additional lease period if the at least one leased resource is still held by the resource group at the expiration of the initial lease period; and allocate the at least one leased resource to the resource group, wherein the resource group holds the at least one leased resource in the resource group until at least one of the application workload has completed or the resource group is moved to an available node from among the plurality of nodes.
3. The computer system according to claim 1, the cluster manager further programmed to: receive a message indicating an error on the another node; decide to trigger a fallover of the resource group from the another node to the particular node; and manage fallover of the resource group from the another node to the particular node by allocating a required minimum selection of resources to the resource group, where the required minimum selection of resources comprises the at least one leased resource, and restarting the application workload on the allocated required minimum selection of resources of the resource group.
4. The computer system according to claim 1, the cluster manager further programmed to: detect an insufficient number of resources available for allocation for the resource group on the particular node, wherein the resources available for allocation do not comprise the at least one leased resource; acquire a license for leasing the at least one leased resource, in exchange for a fee, wherein the license requires that if the at least one leased resource is not released prior to the end of the initial lease period then an additional fee is automatically incurred for the additional lease period; and activate the at least one leased resource from an inactive state to an active state using the license.
5. The computer system according to claim 1, the cluster manager further programmed to: communicatively connect with a plurality of other cluster managers on the other nodes from among the plurality of nodes, wherein the plurality of nodes share at least one storage resource.
6. The computer system according to claim 1, the cluster manager further programmed to: responsive to setting the timer thread, monitor whether the resource group releases the at least one leased resource; and responsive to detecting the resource group releasing the at least one leased resource, cancel the timer thread.
7. The computer system according to claim 1, the cluster manager further programmed to: detect the timer thread expiring while the resource group is holding the at least one leased resource; determine whether there is a selection of at least one other node from among the plurality of nodes with capability to handle one or more resource group requirements for the resource group; responsive to detecting no selection of at least one other node with capability to handle the resource group requirements, maintain the resource group on the particular node comprising the at least one leased resource for the additional lease period; responsive to detecting the selection of the at least one other node with capability to handle the resource group requirements, send a capacity request to a separate cluster manager on each of the selection of the at least one other node; receive a separate capacity response to each capacity request from each separate cluster manager, wherein each separate capacity response comprises a separate current capacity from one of the selection of the at least one other node of the resources available for allocation; calculate a particular cost associated with the particular node continuing to handle the resource group and a separate cost associated with the selection of the at least one other node handling the resource group with each separate current capacity; responsive to the cluster manager calculating the particular cost is the lowest cost from among the particular cost and each separate cost, maintain the resource group on the particular node comprising the at least one leased resource for the additional lease period; and responsive to the cluster manager calculating another cost from among each separate cost is the lowest cost from among the particular cost and each separate cost, trigger fallover of the resource group from the particular node to a next node associated with the another cost from among the selection of the at least one other node and deactivate the at least one leased resource.
8. The computer system according to claim 1, the cluster manager further programmed to: responsive to maintaining the resource group on the particular node comprising the at least one leased resource for the additional lease period, trigger a message to a client interface indicating the lease of the at least one leased resource has been automatically extended for the additional lease period for the additional fee.
9. A computer program product for managing resources, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to: responsive to a cluster manager, for a particular node from among a plurality of nodes communicatively connected through a network, allocating at least one leased resource for a resource group for an application workload on the particular node on fallover of the resource group from another node from among the plurality of nodes to the particular node, set a timer thread to track an amount of time remaining for an initial lease period of the at least one leased resource; and responsive to the timer thread expiring while the resource group is holding the at least one leased resource, maintain the resource group comprising the at least one leased resource for an additional lease period and automatically incur, by the cluster manager for the particular node, an additional fee, only if the particular node has the capacity to handle the resource group at a lowest cost from among the plurality of nodes.
10. The computer program product according to claim 9, further comprising the program instructions executable by a processor to cause the processor to: activate the at least one leased resource from an inactive resource to an active resource by acquiring a license for the at least one leased resource, wherein the cluster manager is only enabled to allocate the at least one leased resource as an active resource, wherein the license for the at least one leased resource specifies an initial fee for the initial lease period and the additional fee for the additional lease period, wherein the license for the at least one leased resource comprises an agreement to pay the additional fee for the additional lease period if the at least one leased resource is still held by the resource group at the expiration of the initial lease period; and allocate the at least one leased resource to the resource group, wherein the resource group holds the at least one leased resource in the resource group until at least one of the application workload has completed or the resource group is moved to an available node from among the plurality of nodes.
11. The computer program product according to claim 9, further comprising the program instructions executable by a processor to cause the processor to: receive a message indicating an error on the another node; decide to trigger a fallover of the resource group from the another node to the particular node; and manage fallover of the resource group from the another node to the particular node by allocating a required minimum selection of resources to the resource group, where the required minimum selection of resources comprises the at least one leased resource, and restarting the application workload on the allocated required minimum selection of resources of the resource group.
12. The computer program product according to claim 9, further comprising the program instructions executable by a processor to cause the processor to: detect an insufficient number of resources available for allocation for the resource group on the particular node, wherein the resources available for allocation do not comprise the at least one leased resource; acquire a license for leasing the at least one leased resource, in exchange for a fee, wherein the license requires that if the at least one leased resource is not released prior to the end of the initial lease period then an additional fee is automatically incurred for the additional lease period; and activate the at least one leased resource from an inactive state to an active state using the license.